The Association between Inefficient Repair of DNA Double-Strand Breaks and Common Polymorphisms of the HRR and NHEJ Repair Genes in Patients with Rheumatoid Arthritis

Rheumatoid arthritis (RA) is an autoimmune disease characterized by chronic inflammation that affects up to 2.0% of adults worldwide. The molecular background of RA has not yet been fully elucidated, but RA is classified as a disease in which the genetic background is one of the most significant risk factors. One hallmark of RA is impaired DNA repair in patient-derived peripheral blood mononuclear cells (PBMCs). The aim of this study was to correlate the phenotype, defined as the efficiency of DNA double-strand break (DSB) repair, with the genotype, limited to single-nucleotide polymorphisms (SNPs) of DSB repair genes. We also analyzed the expression levels of key DSB repair genes. The study population comprised 45 RA patients and 45 healthy controls. We used a comet assay to study DSB repair after in vitro exposure of PBMCs to bleomycin. TaqMan SNP Genotyping Assays were used to determine the distribution of SNPs, and TaqMan Gene Expression Assays were used to assess the RNA expression of DSB repair-related genes. PBMCs from patients with RA had significantly lower bleomycin-induced DNA lesion repair efficiency, and we identified more subjects with inefficient DNA repair in the RA group than in the control group (84.5% vs. 24.4%; OR 41.4, 95% CI, 4.8–355.01). Furthermore, SNPs located within the RAD51 gene (rs1801321 and rs1801320) increased the OR to 53.5 (95% CI, 4.7–613.21), while SNPs in RAD51B (rs963917 and rs3784099) increased it to 73.4 (95% CI, 5.3–1011.05). These results were confirmed by decision tree (DT) analysis (accuracy 0.84, precision 0.87, and specificity 0.86). We also found elevated expression of RAD51B, BRCA1, and BRCA2 in PBMCs isolated from RA patients. These findings indicate that impaired DSB repair in RA may be related to genetic variation in DSB repair genes as well as to their expression levels. However, the mechanism of this relationship, and whether it is direct or indirect, remains to be elucidated.


Introduction
Rheumatoid arthritis (RA) is a chronic, inflammatory, autoimmune rheumatic disease characterized by an improper autoimmune response. RA affects about 2% of the population worldwide. The likelihood of developing RA increases with age, but the disease can develop at any age [1]. Women are much more likely to develop RA than men, a discrepancy that some authors have attempted to link to hormonal factors, although there is no convincing evidence for this. Other factors related to the development of RA include environmental factors such as air pollution, lifestyle factors such as smoking, and genetic factors [2]. Genetic factors are believed to be the main contributors to the development of the disease and are mainly associated with the human leukocyte antigen (HLA) locus; however, other genes are also involved [3]. Genome-wide association studies (GWAS) have not only identified genetic risk factors for RA but have also shown significant variation between ethnic groups [4]. It is generally accepted that some genetic risk factors for RA are shared across human populations; nevertheless, other studies show variation [5][6][7].
The molecular basis of RA is complex. The immune system of individuals with RA produces antibodies targeting multiple proteins that undergo a variety of post-translational modifications. This heterogeneity of antibodies suggests multiple metabolic abnormalities within peripheral blood mononuclear cells (PBMCs). As we and others have previously demonstrated, these abnormalities involve, among other processes, the PBMC response to DNA damage, including DNA repair [8][9][10][11][12]. PBMCs isolated from subjects with RA are more sensitive to DNA-damaging agents than those from healthy subjects. Furthermore, the kinetics of the repair of oxidative DNA damage and DNA double-strand breaks (DSBs) are impaired. The abnormalities in the repair of oxidative DNA damage have been attributed to altered expression and to the occurrence of different allelic forms of genes encoding repair proteins [13]. Changes in DSB repair have been linked to reduced expression of the key protein ATM [9]. To date, analyses of DSB repair function in RA have been rather superficial: they were carried out either on small study samples or in non-Caucasian ethnic groups, and without analysis of the genetic background. Taking these facts into account, in the present study, we analyzed the efficiency of DSB repair in a Caucasian population in correlation with the genotype, limited to the occurrence of single-nucleotide polymorphisms (SNPs) in DSB repair genes. We confirmed the phenotype-genotype correlation using machine learning methods and also analyzed changes in the mRNA expression of key DSB repair genes in PBMCs isolated from subjects with RA.

Characteristics of the Study Population
There were no significant differences in the distributions of age, sex, and smoking status between cases and controls. The mean duration of the disease was 14.9 ± 14.5 years (range, 1 to 75 years). Twenty-three patients were currently receiving (for at least one month before blood collection) disease-modifying anti-rheumatic drugs (DMARDs), including methotrexate and/or sulfasalazine, and twenty-two patients had not received DMARDs within the last month. Glucocorticosteroids (GCS) were used to treat twenty-two patients. Rheumatoid factor (RF) was positive in 36 patients and negative in nine. Anti-cyclic citrullinated peptide antibodies (aCCP) were detected in 32 patients. Additionally, we determined the level of C-reactive protein (CRP) (16.2 ± 20.5 g/dL). Disease activity was also assessed using the 28-joint Disease Activity Score based on C-reactive protein (DAS28-CRP): DAS28 < 1.7 was defined as remission (3 patients), DAS28 > 1.7 and < 2.6 as low disease activity (15 patients), and DAS28 > 5.1 as high disease activity (27 patients). All controls had CRP levels within normal limits and no chronic disease with an inflammatory background (Table 1).

Differences in DNA RepEff between RA Patients and Controls
PBMCs isolated from RA patients had a significantly lower efficiency of bleomycin-induced DNA lesion repair (p < 0.0001; Figure 1). The median RepEff was 78 in the control group vs. 46 in the RA group, and the Hodges-Lehmann estimate of the location shift was 26.33. Moreover, we identified more subjects with marginally efficient and inefficient DNA repair in the RA group than among the controls (43 vs. 22; Table 2).
Table 2. Efficiency of the repair of bleomycin-induced DNA lesions (RepEff) in peripheral blood mononuclear cells (PBMCs) isolated from 45 healthy controls and 45 rheumatoid arthritis (RA) patients. The efficiency was calculated as follows: the DNA damage measured immediately after exposure to bleomycin was set as 100%, and the percentage of DNA repaired after 120 min was then measured.
The RA and control groups had broadly similar genotype distributions of the common polymorphisms located in key DNA double-strand break repair genes (Table 3). Using univariate logistic regression, we found an association with RA for two SNPs of the RAD51 gene (rs1801320, rs1801321) and two SNPs of the RAD51B gene (rs963917, rs3784099). Given the aims of this study, we focused on these SNPs when correlating the phenotype (RepEff) with the genotype (SNPs).

Associations between RepEff (Phenotype) and DSB SNPs (Genotype) and RA
To estimate RA risk, the relative RepEff was grouped according to the quartile values of the controls (Table 2). The crude ORs for RA associated with RepEff in the second, third, and fourth groups were 1.1 (95% CI, 0.06–19.6), 5.45 (95% CI, 0.55–54.28), and 41.5 (95% CI, 4.84–355), respectively, compared with the first group (highly efficient DNA repair). After adjusting for the SNPs rs1801320, rs1801321, rs963917, and rs10483813 in multivariate logistic regression analysis, the OR for the fourth group, corresponding to the no-repair phenotype, increased (Table 4). We also tested for correlations between RepEff and the clinical parameters of RA (DAS28, RF, aCCP, and CRP); no correlations were found.
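Crude ORs such as those above can be computed from a 2 × 2 table of RA and control counts in a given RepEff group versus the reference group. The sketch below assumes the common Woolf (log-scale Wald) 95% CI and uses hypothetical counts; for a single categorical predictor, a crude logistic regression should give the same point estimate, although the exact software output may differ. It also shows why few subjects in a reference cell produce very wide intervals such as 4.84–355.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale Wald) confidence interval.

    2 x 2 layout (hypothetical counts):
        a = RA patients in the exposure group (e.g., a low-RepEff group)
        b = controls in the exposure group
        c = RA patients in the reference group (highly efficient repair)
        d = controls in the reference group
    All four cells must be non-zero for this simple version.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical example: 20 RA / 4 controls in a low-repair group,
# 2 RA / 12 controls in the reference group.
print(odds_ratio_woolf(20, 4, 2, 12))
```

Because the standard error sums the reciprocals of all four cells, a reference cell with only one or two RA patients dominates it, which stretches the upper confidence limit.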

Differences in DSB Gene mRNA and miR-155 Expression Levels between RA Patients and Controls
We also observed deregulated expression of DSB repair genes in RA compared with the controls. Statistically significant differences in expression were found for the RAD51B, BRCA1, and BRCA2 genes, all of which were expressed at higher levels in the RA group than in the control group (median, controls vs. RA: 0.00213 vs. 0.0069; 0.004 vs. 0.008; and 0.00057 vs. 0.001, respectively; p < 0.05). No difference between the

Decision Tree (DT) Analysis
To identify the most important inputs for the DT classifier, that is, those that yield the highest accuracy in recognizing RA patients, we applied a sequential feature selection procedure in two variants: forward and backward. Sequential forward feature selection (SFS) adds one feature at a time to the selected subset, while sequential backward feature selection (SBS) starts from the full set and gradually eliminates the features that contribute least to the model's performance. Both variants explore different combinations of features and aim to find a subset that enhances the model's predictive ability while avoiding irrelevant or redundant features that may introduce noise or lead to overfitting. Because SFS and SBS are greedy algorithms working in opposite directions, they usually lead to different results.
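A minimal sketch of the two greedy searches is given below, assuming a generic `score(subset)` callable. In this study the score would be the leave-one-out accuracy of the DT trained on that subset; here any function of a feature subset can be plugged in, and the additive toy score is purely hypothetical. Features are indexed from 0 (index 0 corresponds to x_1). Ties keep the first candidate, and the reported subset is the highest-scoring one along the search path, preferring fewer features on ties; these are assumptions, not the authors' settings.

```python
def sequential_forward_selection(n_features, score):
    """Greedy SFS: repeatedly add the feature that gives the highest score.

    Returns the best (subset, score) pair and the full search trajectory.
    """
    selected, remaining, history = [], list(range(n_features)), []
    while remaining:
        # Evaluate every candidate that could be added to the current subset.
        best_j = max(remaining, key=lambda j: score(selected + [j]))
        selected = selected + [best_j]
        remaining.remove(best_j)
        history.append((tuple(selected), score(selected)))
    # Highest score wins; on ties prefer the smaller subset.
    best = max(history, key=lambda h: (h[1], -len(h[0])))
    return best, history

def sequential_backward_selection(n_features, score):
    """Greedy SBS: start from all features and repeatedly drop the one whose
    removal gives the highest score."""
    current = list(range(n_features))
    history = [(tuple(current), score(current))]
    while len(current) > 1:
        drop = max(current, key=lambda j: score([k for k in current if k != j]))
        current = [k for k in current if k != drop]
        history.append((tuple(current), score(current)))
    best = max(history, key=lambda h: (h[1], -len(h[0])))
    return best, history

# Hypothetical toy score: additive feature weights instead of DT accuracy.
weights = [0.5, 0.2, -0.1, 0.0]
toy_score = lambda subset: sum(weights[j] for j in subset)
print(sequential_forward_selection(4, toy_score)[0])
print(sequential_backward_selection(4, toy_score)[0])
```

Keeping the whole trajectory mirrors the accuracy-versus-step curves of Figure 3, from which the final subset is read off.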
Figure 3 illustrates the search of the feature space using SFS and SBS. SFS identified x_1 (RepEff) as the most relevant feature, achieving an accuracy of 0.8000 when used as the sole feature in the DT model. In this case, the DT uses only one node and can be expressed as the following simple decision rule: if x_1 = 3 then ŷ = 1 else ŷ = 0. Adding x_10 to the model increases the accuracy to 0.8222, and the decision rule becomes: if x_1 = 3 then ŷ = 1 else (if x_10 = 1 then ŷ = 1 else ŷ = 0). Further adding x_11 raises the accuracy to 0.8444. The tree for this case is depicted in the left panel of Figure 4.
The results obtained from SBS show rather low accuracy when all 30 features are included in the model (Acc = 0.6889). However, by progressively eliminating certain features (x_2, x_3, …, x_29), as illustrated in Figure 3, the accuracy gradually improves to 0.8556. Removing subsequent features, x_5, x_7, …, x_28, does not reduce the accuracy. Finally, the features shown in Table 5 were selected. The tree constructed using these features is shown in the right panel of Figure 4.
For comparison, Figure 5 shows the feature importance estimated using three standard methods. Chi2 examines whether each feature is independent of the response variable using individual chi-square tests. The output score, −ln(p), is based on the p-value of the test statistic and expresses the strength of the relationship between the corresponding feature and the response variable. The minimum redundancy maximum relevance (MRMR) algorithm identifies an optimal set of features that are mutually dissimilar yet effectively represent the response variable. ReliefF feature scoring relies on detecting differences in feature values among pairs of nearest-neighbor instances: the algorithm penalizes features that yield different values for neighbors of the same class and rewards features that yield different values for neighbors of different classes. Unlike Chi2, MRMR and ReliefF are sensitive to feature interactions.
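The Chi2 importance score described above can be sketched in plain Python. The sketch builds the contingency table of one categorical feature (e.g., genotypes A/A, A/G, G/G) against the class label, computes Pearson's chi-square statistic, and returns −ln(p). The tail probability uses the closed-form expressions that hold for integer degrees of freedom. This is a stand-in for a statistics library, not the implementation the authors used, and the genotype data in the usage lines are hypothetical.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_j x^((2j-1)/2) / (1*3*...*(2j-1))
    root = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at sqrt(x)
    total = math.erfc(root / math.sqrt(2))
    term = root
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += 2 * phi * term
    return total

def chi2_score(feature, labels):
    """-ln(p) of Pearson's chi-square test of independence between one
    categorical feature and the class labels."""
    n = len(labels)
    counts = Counter(zip(feature, labels))
    rows, cols = Counter(feature), Counter(labels)
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 0:
        return 0.0  # a constant feature carries no information
    stat = 0.0
    for r, r_total in rows.items():
        for c, c_total in cols.items():
            expected = r_total * c_total / n
            stat += (counts[(r, c)] - expected) ** 2 / expected
    p = chi2_sf(stat, df)
    return math.inf if p == 0 else -math.log(p)

# Hypothetical genotypes and class labels (1 = RA, 0 = healthy).
genotypes = ["A/A", "A/G", "G/G", "A/A", "A/G", "A/A", "G/G", "G/G"]
status = [1, 1, 0, 1, 0, 1, 0, 0]
print(round(chi2_score(genotypes, status), 3))
```

Because each feature is tested on its own, this score cannot reflect interactions between features, which is the contrast with MRMR and ReliefF drawn in the text.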

As shown in Figure 5, all three methods highlight x_1 as the most important feature, in agreement with the selections made by SFS and SBS. Additionally, x_11, chosen by both SFS and SBS, receives high importance scores from all three algorithms. However, x_22, which obtains a high importance score in Chi2, MRMR, and ReliefF, was not selected by the sequential methods. In general, the results obtained from Chi2, MRMR, and ReliefF lack consistency and may not be definitive. It is important to note that SFS and SBS evaluate features within the context of a specific model, whereas Chi2, MRMR, and ReliefF do not consider the predictive model in their assessments.
Table 6 compares DT performance for different input configurations: x_1 only, all available features, features selected by SFS, and features selected by SBS. The performance metrics were determined using leave-one-out cross-validation. The DT hyperparameters were determined through preliminary experiments and set as follows: split criterion, Gini's diversity index; maximum number of splits, 20. The results in Table 6 highlight the substantial discriminative power of x_1 as a standalone feature. When x_1 is augmented with the features selected by the sequential selection algorithms, the model's discriminative ability is further enhanced. However, including all available features diminishes the model's accuracy, which suggests that many of these features lack relevant information and should be eliminated from the model.
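Leave-one-out cross-validation and the metrics reported in Table 6 (accuracy, precision, specificity) can be sketched as follows. The `fit` argument is any function that trains a classifier on the remaining N − 1 samples and returns a prediction function. The one-rule learner used here (majority class for each value of a single feature) is a hypothetical stand-in for the DT, chosen only to keep the example short. Class 1 denotes RA and class 0 healthy, as in the Decision Tree section.

```python
from collections import Counter, defaultdict

def loocv_predictions(X, y, fit):
    """Train on all samples but one, predict the held-out sample, repeat N times."""
    preds = []
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(X_train, y_train)
        preds.append(predict(X[i]))
    return preds

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and specificity with 1 = RA (positive) and 0 = healthy."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, precision, specificity

def fit_one_rule(X_train, y_train, feature=0):
    """Hypothetical stand-in learner: majority class for each value of one feature."""
    by_value = defaultdict(list)
    for row, label in zip(X_train, y_train):
        by_value[row[feature]].append(label)
    rule = {v: Counter(labels).most_common(1)[0][0] for v, labels in by_value.items()}
    fallback = Counter(y_train).most_common(1)[0][0]  # for values unseen in training
    return lambda x: rule.get(x[feature], fallback)

# Hypothetical toy data: feature 0 plays the role of the RepEff category x_1.
X = [[3], [3], [3], [1], [1], [2], [2], [4], [4]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = loocv_predictions(X, y, fit_one_rule)
print(binary_metrics(y, preds))  # (1.0, 1.0, 1.0)
```

With N = 90, LOOCV refits the model 90 times, and every sample is scored by a model that never saw it.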

Study Groups
The study group included 45 patients diagnosed with RA who were hospitalized at the Department of Rheumatology, Medical University of Lodz, or treated in the outpatient clinic. A total of 45 healthy subjects without symptoms of chronic inflammatory conditions and without a history of cancer in their closest relatives were selected as the control group. All RA patients met the European League Against Rheumatism/American College of Rheumatology (EULAR/ACR) 2010 classification criteria for RA. The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Bioethics Committee of the Medical University of Lodz (Lodz, Poland) (no. RNN/07/18/KE, approved 16 January 2018), and informed consent was obtained from all subjects involved in the study.

PBMC Isolation
PBMCs were isolated from 9 mL of peripheral blood from each participant by density-gradient centrifugation using Lymphosep (Biowest, Nuaillé, France). Blood diluted 1:1 in PBS was gently layered onto Lymphosep and centrifuged for 15 min at 400× g at room temperature. The collected PBMCs (1 × 10⁵) were washed twice in PBS.

Comet Assay
Endogenous DNA lesions, DNA lesions resulting from the exposure of PBMCs to bleomycin, and DNA repair effectiveness were assessed using alkaline single-cell gel electrophoresis (comet assay). PBMCs treated with 25 µM bleomycin for 30 min at 37 °C were suspended in low melting point (LMP) agarose (Sigma-Aldrich Corp., St. Louis, MO, USA) and applied to slides coated with normal melting point (NMP) agarose (Sigma-Aldrich Corp., St. Louis, MO, USA). To study DNA repair, the cells were allowed to recover for 120 min in fresh medium before suspension in LMP agarose. The prepared slides were incubated in lysis buffer at pH 10 (2.5 M NaCl, 10 mM Tris, 100 mM EDTA) with 1% Triton X-100 (Sigma-Aldrich Corp., St. Louis, MO, USA) for 1 h at 4 °C. After incubation, the slides were left in development buffer (300 mM NaOH, 1 mM EDTA) for 20 min at 4 °C. The preparations were then electrophoresed in electrophoresis buffer (30 mM NaOH, 1 mM EDTA) under the following conditions: 17 V, 32 mA, 20 min, room temperature. Slides were rinsed three times with distilled water and stained with DAPI. The stained comets were analyzed using a Nikon CI-L plus fluorescence microscope (Nikon, Tokyo, Japan) with Lucia software v.6.70 (Laboratory Imaging, Prague, Czech Republic).
The individual DNA repair efficiency was calculated as shown previously [13].Briefly, we subtracted the percentage of DNA damage measured after 120 min of repair from the initial damage score set as 100%.Then, we set the repair ranks based on the repair efficiency quartiles of the control group.Quartile 4 means highly efficient DNA repair, quartile 3 efficient repair, quartile 2 marginally efficient DNA repair and quartile 1 no repair.
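The RepEff calculation and quartile ranking described above can be sketched as follows. The sketch assumes that the residual damage after 120 min is expressed as a percentage of the initial damage (both measured, for example, as % tail DNA in the comet assay). The quartile cut-offs use Python's `statistics.quantiles` with its default method, and values falling exactly on a cut-off go to the higher rank. Both choices are assumptions, since the interpolation rule used in the study is not stated. All numbers are hypothetical.

```python
import statistics

def repair_efficiency(damage_0min, damage_120min):
    """RepEff (%): initial damage set to 100%, minus the residual damage after 120 min."""
    residual_percent = 100.0 * damage_120min / damage_0min
    return 100.0 - residual_percent

def repair_rank(rep_eff, control_values):
    """Rank 1-4 from the control-group quartiles:
    4 = highly efficient, 3 = efficient, 2 = marginally efficient, 1 = no repair."""
    q1, q2, q3 = statistics.quantiles(control_values, n=4)
    if rep_eff >= q3:
        return 4
    if rep_eff >= q2:
        return 3
    if rep_eff >= q1:
        return 2
    return 1

# Hypothetical % tail DNA for one subject: 40% right after bleomycin, 10% after 120 min.
print(repair_efficiency(40.0, 10.0))  # 75.0
controls = [10, 20, 30, 40, 50, 60, 70, 80]  # hypothetical control RepEff values
print([repair_rank(v, controls) for v in (5, 30, 50, 90)])  # [1, 2, 3, 4]
```

Because the cut-offs come only from the control group, the ranks of RA patients are measured against healthy repair capacity rather than against the pooled sample.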

DNA Isolation
DNA was isolated from the peripheral blood of RA patients and controls using the GeneMatrix Blood DNA purification Kit (EURx, Gdansk, Poland). The blood samples were lysed in the presence of a buffer containing proteinase K and chaotropic salts. Ethanol was then added to selectively bind the DNA to the membrane in the spin column. After a short centrifugation, the DNA was bound to the membrane, while unbound impurities remained in the column effluent. Next, the samples were washed twice with wash buffer to remove the remaining contaminants from the DNA-bound membrane. Purified DNA was eluted with a low-salt buffer containing Tris-EDTA. The concentration and purity of the obtained DNA were evaluated spectrophotometrically by measuring absorbance at 260 and 280 nm on a Synergy HT spectrophotometer (BioTek, Hong Kong, China).
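The absorbance readings can be converted into concentration and purity values using the standard approximation that an A260 of 1.0 corresponds to about 50 µg/mL of double-stranded DNA at a 1 cm path length (or with path-length-corrected plate readings). An A260/A280 ratio of about 1.8 is generally taken to indicate DNA largely free of protein contamination. The function names, dilution factor, and readings below are illustrative assumptions, not part of the kit protocol.

```python
def dsdna_concentration(a260, dilution_factor=1.0, ug_per_ml_per_a260=50.0):
    """Approximate dsDNA concentration (µg/mL) from A260, assuming a 1 cm path length."""
    return a260 * ug_per_ml_per_a260 * dilution_factor

def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 suggest little protein contamination."""
    return a260 / a280

# Hypothetical readings for a 1:10 diluted sample.
print(dsdna_concentration(0.25, dilution_factor=10))  # 125.0
print(round(purity_ratio(0.25, 0.135), 2))
```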

RNA Expression of DSB Repair-Related Genes
We performed TaqMan Gene Expression Assays (Thermo Fisher Scientific Inc., Waltham, MA, USA) to assess the expression profile of seven DSB-related genes: BRCA2 (Hs01037416_m1), BRCA1 (Hs02387156_m1), ATM (Hs01112362_m1), RAD51B (Hs01568761_m1), RAD51 (Hs00947967_m1), PRKDC (Hs01016091_g1), and H2AFX (Hs01573336_s1). RPLP1 (Hs02926887_g1) was used as the reference gene. Each 10 µL reaction contained 1 µL cDNA, 1 µL primers, 5 µL 2× TaqMan® Universal Master Mix II, No UNG, and 3 µL nuclease-free water. The reaction conditions followed the manufacturer's protocol for TaqMan® Universal Master Mix II, No UNG: polymerase activation (10 min, 95 °C), followed by 30 cycles of denaturation (15 s, 95 °C) and annealing/extension (60 s, 60 °C). qPCR was performed on a Bio-Rad CFX96 system (Bio-Rad, Hercules, CA, USA). Gene expression was calculated relative to the reference gene (ΔCt(sample) = Ct(target gene) − Ct(reference gene)), and the relative mRNA expression was then calculated as 2^−ΔCt(sample). Expression status was determined relative to the expression values of the analyzed gene in the control population.
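The relative quantification above reduces to two steps per sample: ΔCt = Ct(target) − Ct(reference, here RPLP1), followed by 2^−ΔCt. The sketch assumes one Ct value per gene per sample (e.g., the mean of technical replicates) and uses hypothetical Ct values. Group medians computed this way are what Figure 2 compares.

```python
import statistics

def delta_ct(ct_target, ct_reference):
    """ΔCt = Ct(target gene) − Ct(reference gene)."""
    return ct_target - ct_reference

def relative_expression(ct_target, ct_reference):
    """Relative mRNA expression as 2^−ΔCt (a higher value means higher expression)."""
    return 2.0 ** -delta_ct(ct_target, ct_reference)

# Hypothetical (target, RPLP1) Ct pairs for three samples of one group.
samples = [(30.0, 21.0), (29.5, 21.5), (31.0, 22.0)]
values = [relative_expression(t, r) for t, r in samples]
group_median = statistics.median(values)
print(group_median)
```

A one-cycle difference in ΔCt corresponds to a two-fold difference in relative expression, so the roughly two-fold median differences reported for BRCA1 and BRCA2 correspond to about one cycle.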

Statistical Analysis
The normality of continuous variables was assessed using the Shapiro-Wilk test. Because the data were not normally distributed, descriptive data are expressed as median and range. The two groups (RA patients and healthy controls) were compared using the Mann-Whitney U test, and the Hodges-Lehmann method was used to estimate the location shift. Multinomial logistic regression analyses were performed to calculate odds ratios (ORs) and 95% confidence intervals (CIs) for the effects of DNA repair status and other variables on RA. All variables included in the final multivariate models were confirmed to be independent by assessing their collinearity. Genotypes of DNA repair genes were included as independent variables in univariate and multivariate multinomial logistic regression analyses. Only matching variables and factors that altered the ORs by 10% were included in the final multivariate models. Model fit was assessed using the Hosmer-Lemeshow test. All statistical analyses were performed using TIBCO Statistica 13.3 (Palo Alto, CA, USA). A p value < 0.05 was considered statistically significant in all tests.
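A minimal plain-Python sketch of the two-group comparison follows. The Mann-Whitney U statistic counts, over all control-RA pairs, how often the control value exceeds the RA value (ties count one half). The Hodges-Lehmann estimate of the location shift is the median of all pairwise differences. The normal-approximation p-value below omits tie and continuity corrections, so it is only indicative and may differ from Statistica's output. The RepEff values are hypothetical.

```python
import math
import statistics

def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y (ties count 0.5)."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)

def mann_whitney_p_normal(x, y):
    """Two-sided p-value from the normal approximation (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))

def hodges_lehmann_shift(x, y):
    """Hodges-Lehmann location shift: median of all pairwise differences x_i − y_j."""
    return statistics.median([xi - yj for xi in x for yj in y])

# Hypothetical RepEff values (%) for controls and RA patients.
controls_rep = [78, 82, 65, 90, 71]
ra_rep = [46, 52, 30, 61, 40]
print(mann_whitney_u(controls_rep, ra_rep))
print(hodges_lehmann_shift(controls_rep, ra_rep))
print(mann_whitney_p_normal(controls_rep, ra_rep))
```

Unlike a difference of medians, the Hodges-Lehmann estimate uses every control-RA pair, which is why it is the usual companion to the Mann-Whitney test.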

Decision Tree
A decision tree (DT) model is used in this study to recognize RA patients based on repair efficiency (RepEff) and genetic data, i.e., the 29 SNPs listed in Table 7. We define a sample representing a single patient or control as a tuple of 30 features, x = (x_1, …, x_30), where x_1 is RepEff and x_2, …, x_30 correspond to the successive SNPs. All features are treated as categorical. We consider N = 90 samples from two equinumerous classes: RA and healthy. The output of the tree is y = 0 (healthy) or y = 1 (RA).
The DT classifier is a powerful yet conceptually simple non-parametric algorithm that effectively partitions the feature space into smaller regions by recursively selecting the feature that offers the maximum information gain at each node.One of the key advantages of DT is its interpretability, allowing us to understand the underlying decision-making process.Moreover, this algorithm is versatile, capable of handling both numerical and categorical data, making it suitable for a wide range of applications.
We utilize the CART (classification and regression tree) implementation of the DT algorithm, as proposed by Breiman.The process of constructing a tree involves recursively partitioning a dataset into subsets based on the values of features.At each node, the algorithm selects the best feature to split the data, employing an impurity measure like the Gini index or entropy.This selection process is repeated recursively until a stopping criterion is met, such as reaching a predefined maximum number of splits.The resulting tree can then be utilized to predict the class of new samples by traversing from the root node to a leaf node corresponding to the predicted class.DT can be easily interpreted as a collection of simple rules that humans can readily understand.
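The final decision of the DT classifier can be expressed as follows:
$$\hat{y}(\mathbf{x}) = \sum_{l \in L} \mathrm{label}(l)\, I(\mathbf{x} \in l)$$
where L represents the set of leaves, the function label(l) assigns a label to leaf l based on the subset of samples that reached that leaf, and the function I(x ∈ l) returns 1 when sample x reaches leaf l and 0 otherwise. Because each sample reaches exactly one leaf, the sum reduces to the label of that leaf.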
Typically, the label assigned by the function label(l) is the majority class within the subset that reached leaf l. In our case, the tests performed on the features at each node have the form x_j = v_j, where j is the feature index selected individually for each node, and v_j is a value from the domain of the j-th feature (e.g., A/A).
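The recursive partitioning described above can be sketched for categorical features with equality tests x_j = v_j and the Gini index. This is a simplified, hypothetical illustration, not the CART implementation the authors used. In particular, it stops at a fixed depth instead of at a maximum number of splits (the study used 20 splits). Features are indexed from 0, so index 0 corresponds to x_1, and the toy data are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return (gain, j, v) for the test x_j == v with the largest Gini decrease."""
    n, parent, best = len(y), gini(y), None
    for j in range(len(X[0])):
        for v in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] == v]
            right = [lab for row, lab in zip(X, y) if row[j] != v]
            if not left or not right:
                continue  # this test does not split the node
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - child > best[0]:
                best = (parent - child, j, v)
    return best

def build_tree(X, y, max_depth=3, min_gain=1e-12):
    """Recursively grow a binary tree; each leaf stores the majority class (label(l))."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return {"leaf": majority}
    split = best_split(X, y)
    if split is None or split[0] <= min_gain:
        return {"leaf": majority}
    _, j, v = split
    yes = [i for i, row in enumerate(X) if row[j] == v]
    no = [i for i, row in enumerate(X) if row[j] != v]
    return {
        "feature": j,
        "value": v,
        "yes": build_tree([X[i] for i in yes], [y[i] for i in yes], max_depth - 1, min_gain),
        "no": build_tree([X[i] for i in no], [y[i] for i in no], max_depth - 1, min_gain),
    }

def predict(tree, x):
    """Follow the x_j == v_j tests from the root until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["yes"] if x[tree["feature"]] == tree["value"] else tree["no"]
    return tree["leaf"]

# Hypothetical samples: (RepEff category, one SNP genotype); 1 = RA, 0 = healthy.
X = [[3, "A/A"], [3, "A/G"], [1, "A/A"], [2, "G/G"], [4, "A/G"], [3, "G/G"]]
y = [1, 1, 0, 0, 0, 1]
tree = build_tree(X, y, max_depth=2)
print(tree)
print([predict(tree, row) for row in X])
```

On this toy data the root test is x_1 = 3, which has the same form as the one-node rule reported for SFS.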

Discussion
This study appears to be the first to find a correlation between less efficient repair of DSBs in PBMCs from patients with RA and SNPs within the RAD51 and RAD51B genes. Our finding of less efficient DSB repair in RA is consistent with previous reports [9][10][11], and higher endogenous DSB levels have also been found in PBMCs isolated from patients with systemic sclerosis and systemic lupus erythematosus (SLE) [14,15]. Human cells repair DSBs by nonhomologous end joining (NHEJ) or homologous recombination (HR) and their variants [16]. The choice of pathway depends on many factors and is initiated by DNA end resection, which creates single-stranded DNA overhangs. The initiation of HR depends on RAD51 and its paralogs, which block the access of NHEJ proteins to the DNA overhangs [17]. RAD51B also has another role in DSB repair: it transduces the DSB signal to effector kinases such as ATM, ATR, and DNA-PK and promotes the repair process [18]. RAD51B polymorphisms have been associated with RA and with erosion in patients with RA [19,20]. It therefore seems that the allelic diversity of RAD51 and RAD51B may influence their roles in DSB repair and contribute to less efficient repair. We have also shown that RAD51B expression is elevated in PBMCs isolated from subjects with RA. RAD51B interacts with other RAD51 paralogs to form a complex whose role in DSB repair is to stabilize protein foci on DNA overhangs. Given our limited knowledge of how RAD51 paralogs function, it is difficult to determine definitively the mechanism underlying the decreased efficiency of DSB repair when RAD51B is overexpressed. However, increased levels of one paralog may alter the delicate balance of the RAD51 paralog complex and contribute to the destabilization of repair protein foci. We also observed overexpression of BRCA1 and BRCA2. Both are known tumor suppressor genes, and overexpression of some of their transcript variants has been linked to delayed DSB repair [21]. In contrast to the earlier findings of Shao et al. [9], our results suggest that ATM expression is not down-regulated in RA. There may be two reasons for this discrepancy. First, Shao et al. [9] analyzed expression only in T lymphocytes. Second, the ethnic composition of their study group was completely different: more than 70% of the subjects were African American.
There are two potential hypotheses that explain the disruption of repair processes in RA. The first is related to chronic inflammation and the associated oxidative stress [22]. Oxidative stress increases the level of reactive oxygen species in the cell and the number of mutations in repair-related genes, for example, TP53 [23], resulting in decreased repair efficiency. The second hypothesis is related to accelerated immune ageing in RA [24]. This results in the classic aged T cell phenotype, which exhibits all the features of ageing metabolism, including reduced DNA repair efficiency. The two hypotheses are not mutually exclusive; they are coupled by a process called inflammaging, in which ageing is accompanied by low-grade chronic inflammation [25]. Our results are consistent with both hypotheses, since the genetic background, understood here as the presence of particular allelic forms of DNA repair genes, could both accelerate the ageing process (through inefficient DNA repair) and increase the mutation rate through a deficiency in repairing oxidative DNA lesions resulting from chronic stress. Another possible link between RA and inefficient DSB repair caused by the abnormal expression and genetic variation of DNA repair genes is V(D)J recombination, the process that generates diversity in adaptive immunity [26]. The NHEJ system and the RAD51 protein are involved in this process [27,28]. In brief, V(D)J recombination involves the introduction of DNA breaks within the T cell receptor and immunoglobulin genes and their imprecise repair. Errors in V(D)J recombination can induce genotoxic stress and accelerate the ageing of T cells. T cell ageing is now recognised as a risk determinant in autoimmune syndromes, including RA, and is associated with the progression of RA [29].

Conclusions
The results of this study suggest that genetic variations in the RAD51 and RAD51B genes contribute to the delayed/marginally efficient DSB repair phenotype in RA. We based this conclusion on logistic regression and confirmed it using a machine learning approach based on decision trees. Our study has some limitations. Although the SBS method identified four candidates whose SNPs may modulate DNA repair processes, in our opinion, replication studies in a larger sample are needed to confirm these relationships. Furthermore, functional studies of the RAD51 SNPs should be performed. It is important to functionally characterize the genetic variants of RAD51 and to identify the biological mechanisms underlying the associations in order to assess RAD51 SNPs as prognostic and/or predictive biomarkers in RA. Potential clinical applications include helping to identify an individual's risk of RA and adjusting treatment for RA based on each patient's unique molecular profile. In our opinion, profiling patients with RA for DNA repair deficiency is particularly important for the earlier detection of individuals who may be at risk of developing lymphomas as a consequence of RA and deficient DNA repair. For this reason, future studies should cover a broader time frame, taking into account the DNA repair profile of RA patients and following them for any cancer occurrence. Further studies should also be conducted in non-Caucasian ethnic groups to determine whether the phenotype-genotype relationships we have found are common to all human populations or limited to the Caucasian ethnic group.

Figure 1. Distribution of the individual repair efficiency of bleomycin-induced DNA lesions in peripheral blood mononuclear cells (PBMCs) isolated from 45 healthy controls (green) and 45 rheumatoid arthritis (RA, blue) patients. Data are presented as medians. Differences between groups were analyzed using the Mann-Whitney U test; **** p < 0.0001.

Figure 2. Comparison of the expression levels of key DNA double-strand break repair genes, (A) RAD51B, (B) BRCA1, (C) BRCA2, (D) RAD51, (E) ATM, (F) PRKDC, and (G) γ-H2AX, in peripheral blood mononuclear cells (PBMCs) from controls (green; n = 45) and patients with rheumatoid arthritis (blue; n = 45). Data are presented as medians. Differences between the groups were analyzed using the Mann-Whitney U test; * p < 0.05.

Figure 3. Accuracy of the DT classifier in the successive steps of SFS ((a), left panel) and SBS ((b), right panel).

Table 1. Characteristics of the study population.

Table 3. Allele and genotype frequencies of the common polymorphisms located in key DNA double-strand break repair-related genes in the control and rheumatoid arthritis (RA) groups.
Bold indicates statistically significant results; NA, not determined.

Table 4. Multivariate logistic regression analysis of the efficiency of the repair of bleomycin-induced DNA lesions and common polymorphisms located in DNA double-strand break repair-related genes in the control and rheumatoid arthritis (RA) groups.

Table 5. SNP features selected using SFS and SBS.

Table 6. Performance of DT with different input configurations.

Table 7. Single-nucleotide polymorphisms analyzed in this study.